Reducing over-clustering via the powered Chinese restaurant process

Authors

  • Jun Lu
  • Meng Li
  • David Dunson
Abstract

Dirichlet process mixture (DPM) models tend to produce many small clusters regardless of whether they are needed to accurately characterize the data; this is particularly true for large data sets. However, interpretability, parsimony, data storage and communication costs are all hampered by having too many clusters. We propose a powered Chinese restaurant process to limit this kind of problem and penalize over-clustering. The method is illustrated using simulation examples and data sets with large and small sample sizes, including MNIST and the Old Faithful Geyser data.
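
The abstract does not state the seating rule, so the following is only a minimal sketch under an assumed rule: a new customer joins existing table k with weight n_k^r and opens a new table with weight alpha, so that r = 1 gives the ordinary Chinese restaurant process and r > 1 favors large tables. The function name `sample_powered_crp` and the parameters `alpha`, `r`, and `seed` are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import random


def sample_powered_crp(n, alpha=1.0, r=1.0, seed=0):
    """Seat n customers one at a time and return the list of table sizes.

    Assumed rule (not given in the abstract): an existing table of size s
    has weight s ** r, and a new table has weight alpha.
    """
    rng = random.Random(seed)
    sizes = []
    for _ in range(n):
        # Weights for each existing table, followed by the new-table option.
        weights = [s ** r for s in sizes] + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)   # open a new table
        else:
            sizes[k] += 1     # join existing table k
    return sizes


if __name__ == "__main__":
    for power in (1.0, 1.5, 2.0):
        tables = sample_powered_crp(1000, alpha=1.0, r=power, seed=42)
        print(f"r={power}: {len(tables)} tables")
```

Under this assumed rule, raising r above 1 makes the new-table weight shrink relative to the existing tables as they grow, which is one way a prior can penalize many small clusters.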

Similar resources

Chinese Restaurant Process for cognate clustering: A threshold free approach

In this paper, we introduce a threshold-free approach, motivated by the Chinese Restaurant Process, for the purpose of cognate clustering. We show that our approach yields results similar to those of a linguistically motivated cognate clustering system known as LexStat. Our Chinese Restaurant Process system is fast, does not require any threshold, and can be applied to any language family of the world.

A Gibbs Sampler for Spatial Clustering with the Distance-dependent Chinese Restaurant Process

The distance-dependent Chinese Restaurant Process (dd-CRP) is a flexible class of distributions over partitions which was recently introduced by [1, 2]. In their description and experiments Blei and Frazier focus on the sequential setting such as clustering over time. Their Gibbs sampler, while general in nature, does not explicitly handle the case of non-sequential (also called spatial) cluste...

Tracklet clustering for robust multiple object tracking using distance dependent Chinese restaurant processes

To devise an accurate and efficient strategy for the object detection–object track assignment problem, we present a tracklet clustering approach using distance dependent Chinese restaurant processes (ddCRPs), which employ a two-level robust object tracker. The first level is an ordinary tracklet generator that obtains short yet reliable tracklets. In the second level, we cluster the tracklets ove...

Dynamic Non-Parametric Mixture Models and the Recurrent Chinese Restaurant Process: with Applications to Evolutionary Clustering

Clustering is an important data mining task for exploration and visualization of different data types like news stories, scientific publications, weblogs, etc. Due to the evolving nature of these data, evolutionary clustering, also known as dynamic clustering, has recently emerged to cope with the challenges of mining temporally smooth clusters over time. A good evolutionary clustering algorith...

Temporally-Reweighted Chinese Restaurant Process Mixtures for Clustering, Imputing, and Forecasting Multivariate Time Series

This article proposes a Bayesian nonparametric method for forecasting, imputation, and clustering in sparsely observed, multivariate time series. The method is appropriate for jointly modeling hundreds of time series with widely varying, non-stationary dynamics. Given a collection of N time series, the Bayesian model first partitions them into independent clusters using a Chinese restaurant pro...


Journal title:
  • CoRR

Volume: abs/1802.05392

Publication date: 2018